AITopics | size increase

Collaborating Authors

size increase

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

4533e4a352440a32558c1c227602c323-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 04:22:28 GMT

flatmap, proportionality, voxelwise, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.51)

Add feedback

Collegial Ensembles

Neural Information Processing SystemsDec-24-2025, 17:32:33 GMT

Modern neural network performance typically improves as model size increases. A recent line of research on the Neural Tangent Kernel (NTK) of over-parameterized networks indicates that the improvement with size increase is a product of a better conditioned loss landscape. In this work, we investigate a form of over-parameterization achieved through ensembling, where we define collegial ensembles (CE) as the aggregation of multiple independent models with identical architectures, trained as a single model. We show that the optimization dynamics of CE simplify dramatically when the number of models in the ensemble is large, resembling the dynamics of wide models, yet scale much more favorably. We use recent theoretical results on the finite width corrections of the NTK to perform efficient architecture search in a space of finite width CE that aims to either minimize capacity, or maximize trainability under a set of constraints. The resulting ensembles can be efficiently implemented in practical architectures using group convolutions and block diagonal layers. Finally, we show how our framework can be used to analytically derive optimal group convolution modules originally found using expensive grid searches, without having to train a single model.

collegial ensemble, electronic proceedings, name change, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

We show that weights learned by AdamW can be understood as an exponential moving average (EMA) of recent updates. This gives critical insights for how to set the weight decay in AdamW, and how the weight decay should scale with model and dataset size. In particular, the key hyperparameter for an exponential moving average is the EMA timescale. Intuitively, the EMA timescale can be understood as the number of recent iterations the EMA averages over. Given a fixed learning rate, there is a one-to-one mapping from the EMA timescale to the usual weight decay hyperparameter. Thus, choosing an EMA timescale implicitly sets the weight decay. Importantly, there are natural guidelines for sensible values for the EMA timescale: we need to average over all datapoints, so the EMA timescale should not be (much) smaller than 1 epoch, and we need to forget early updates, so the EMA timescale should not be (much) bigger than the total number of training epochs. In our experiments, we find that optimal EMA timescales are consistent with these guidelines, as are the hyperparameters chosen in recent large-scale LLM pretraining runs (e.g.\ Llama 1+2 and Stable LM). Critically, these guidelines suggest that the optimal EMA timescale should not change (much) as we scale the model and dataset. That implies that as the dataset size increases, the optimal weight decay should fall. Moreover, as the model size increases, the optimal weight decay should also increase (if we follow the muP recommendation for scaling the learning rate).

epoch, timescale, weight decay, (16 more...)

arXiv.org Artificial Intelligence

2405.13698

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Reservoir Computing with Magnetic Thin Films

Dale, Matthew, Griffin, David, Evans, Richard F. L., Jenkins, Sarah, O'Keefe, Simon, Sebald, Angelika, Stepney, Susan, Torre, Fernando, Trefzer, Martin

arXiv.org Artificial IntelligenceOct-30-2023

Advances in artificial intelligence are driven by technologies inspired by the brain, but these technologies are orders of magnitude less powerful and energy efficient than biological systems. Inspired by the nonlinear dynamics of neural networks, new unconventional computing hardware has emerged with the potential to exploit natural phenomena and gain efficiency, in a similar manner to biological systems. Physical reservoir computing demonstrates this with a variety of unconventional systems, from optical-based to memristive systems. Reservoir computers provide a nonlinear projection of the task input into a high-dimensional feature space by exploiting the system's internal dynamics. A trained readout layer then combines features to perform tasks, such as pattern recognition and time-series analysis. Despite progress, achieving state-of-the-art performance without external signal processing to the reservoir remains challenging. Here we perform an initial exploration of three magnetic materials in thin-film geometries via microscale simulation. Our results reveal that basic spin properties of magnetic films generate the required nonlinear dynamics and memory to solve machine learning tasks (although there would be practical challenges in exploiting these particular materials in physical implementations). The method of exploration can be applied to other materials, so this work opens up the possibility of testing different materials, from relatively simple (alloys) to significantly complex (antiferromagnetic reservoirs).

artificial intelligence, machine learning, reservoir, (16 more...)

arXiv.org Artificial Intelligence

2101.127

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Media (0.70)
Leisure & Entertainment (0.70)
Materials (0.68)
Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Towards a Better Understanding of Learning with Multiagent Teams

Radke, David, Larson, Kate, Brecht, Tim, Tilbury, Kyle

arXiv.org Artificial IntelligenceJun-28-2023

While it has long been recognized that a team of individual learning agents can be greater than the sum of its parts, recent work has shown that larger teams are not necessarily more effective than smaller ones. In this paper, we study why and under which conditions certain team structures promote effective learning for a population of individual learning agents. We show that, depending on the environment, some team structures help agents learn to specialize into specific roles, resulting in more favorable global results. However, large teams create credit assignment challenges that reduce coordination, leading to large teams performing poorly compared to smaller ones. We support our conclusions with both theoretical analysis and empirical results.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2306.16205

Country:

North America > United States > New York (0.04)
North America > Canada > Ontario (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Education (0.68)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

Add feedback

READ: Recurrent Adaptation of Large Transformers

Wang, Sid, Nguyen, John, Li, Ke, Wu, Carole-Jean

arXiv.org Artificial IntelligenceMay-24-2023

Fine-tuning large-scale Transformers has led to the explosion of many AI applications across Natural Language Processing and Computer Vision tasks. However, fine-tuning all pre-trained model parameters becomes impractical as the model size and number of tasks increase. Parameter-efficient transfer learning (PETL) methods aim to address these challenges. While effective in reducing the number of trainable parameters, PETL methods still require significant energy and computational resources to fine-tune. In this paper, we introduce \textbf{RE}current \textbf{AD}aption (READ) -- a lightweight and memory-efficient fine-tuning method -- to overcome the limitations of the current PETL approaches. Specifically, READ inserts a small RNN network alongside the backbone model so that the model does not have to back-propagate through the large backbone network. Through comprehensive empirical evaluation of the GLUE benchmark, we demonstrate READ can achieve a $56\%$ reduction in the training memory consumption and an $84\%$ reduction in the GPU energy usage while retraining high model quality compared to full-tuning. Additionally, the model size of READ does not grow with the backbone model size, making it a highly scalable solution for fine-tuning large Transformers.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.15348

Country: Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Calibration Assessment and Boldness-Recalibration for Binary Events

Guthrie, Adeline P., Franck, Christopher T.

arXiv.org Machine LearningMay-9-2023

Probability predictions are essential to inform decision making in medicine, economics, image classification, sports analytics, entertainment, and many other fields. Ideally, probability predictions are (i) well calibrated, (ii) accurate, and (iii) bold, i.e., far from the base rate of the event. Predictions that satisfy these three criteria are informative for decision making. However, there is a fundamental tension between calibration and boldness, since calibration metrics can be high when predictions are overly cautious, i.e., non-bold. The purpose of this work is to develop a hypothesis test and Bayesian model selection approach to assess calibration, and a strategy for boldness-recalibration that enables practitioners to responsibly embolden predictions subject to their required level of calibration. Specifically, we allow the user to pre-specify their desired posterior probability of calibration, then maximally embolden predictions subject to this constraint. We verify the performance of our procedures via simulation, then demonstrate the breadth of applicability by applying these methods to real world case studies in each of the fields mentioned above. We find that very slight relaxation of calibration probability (e.g., from 0.99 to 0.95) can often substantially embolden predictions (e.g., widening Hockey predictions' range from .25-.75 to .10-.90)

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

2305.0378

Country:

North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
North America > United States > Michigan > Wayne County (0.04)

Genre: Research Report > Experimental Study (0.69)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Sports > Hockey (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)

Add feedback

Speech Recognition With No Speech Or With Noisy Speech Beyond English

Krishna, Gautam, Tran, Co, Han, Yan, Carnahan, Mason, Tewfik, Ahmed H

arXiv.org Machine LearningJun-17-2019

In this paper we demonstrate continuous noisy speech recognition using connectionist temporal classification (CTC) model on limited Chinese vocabulary using electroencephalography (EEG) features with no speech signal as input and we further demonstrate single CTC model based continuous noisy speech recognition on limited joint English and Chinese vocabulary using EEG features with no speech signal as input.

artificial intelligence, recognition, speech recognition, (13 more...)

arXiv.org Machine Learning

1906.08045

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Diagnostic Medicine (0.67)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.35)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Add feedback